A Framework for Understanding LSI Performance
Abstract
In this paper we present a theoretical model for understanding the performance of LSI search and retrieval applications. Many models for understanding LSI have been proposed. Ours is the first to study the values produced by LSI in the term dimension vectors. The framework presented here is based on term co-occurrence data. We show a strong correlation between second-order term co-occurrence and the values produced by the SVD algorithm that forms the foundation for LSI. We also present a mathematical proof that the SVD algorithm encapsulates term co-occurrence information.
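The relationship the abstract describes can be illustrated with a minimal sketch. It is not the paper's experiment: the five-term, four-document matrix is invented toy data, the rank `k = 2` is an arbitrary choice, and counting two-step paths with `first_order @ first_order` is only one simple reading of "second-order co-occurrence". All variable names are hypothetical. The sketch assumes NumPy is available.

```python
import numpy as np

# Toy term-by-document matrix (rows: terms, columns: documents).
# The data is invented for illustration only.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car:        doc 0
    [0, 1, 0, 0],  # automobile: doc 1
    [1, 1, 0, 0],  # engine:     docs 0 and 1
    [0, 0, 1, 1],  # flower:     docs 2 and 3
    [0, 0, 1, 0],  # petal:      doc 2
], dtype=float)

# First-order co-occurrence: entry (i, j) counts documents containing both terms.
first_order = A @ A.T

# Two-step co-occurrence paths: terms linked through a shared intermediate term.
# (Assumption: a simple path count, not necessarily the paper's definition.)
second_order = first_order @ first_order

# Rank-k truncated SVD, the decomposition underlying LSI.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Term-term similarities in the reduced space.
term_term_k = A_k @ A_k.T

car, automobile, flower = 0, 1, 3
print("first-order  car/automobile:", first_order[car, automobile])
print("second-order car/automobile:", second_order[car, automobile])
print("rank-k value car/automobile:", round(term_term_k[car, automobile], 6))
print("rank-k value car/flower:    ", round(term_term_k[car, flower], 6))
```

In this toy data "car" and "automobile" never share a document, yet both co-occur with "engine". The truncated term-term matrix gives the pair a nonzero value, while "car" and "flower", which have no connecting path, stay at approximately zero.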
Similar resources
A framework for understanding Latent Semantic Indexing (LSI) performance
In this paper we present a theoretical model for understanding the performance of Latent Semantic Indexing (LSI) search and retrieval applications. Many models for understanding LSI have been proposed. Ours is the first to study the values produced by LSI in the term dimension vectors. The framework presented here is based on term co-occurrence data. We show a strong correlation between second ...
A Mathematical View of Latent Semantic Indexing: Tracing Term Co-occurrences
Current research in Latent Semantic Indexing (LSI) shows improvements in performance for a wide variety of information retrieval systems. We propose the development of a theoretical foundation for understanding the values produced in the reduced form of the term-term matrix. We assert that LSI’s use of higher orders of co-occurrence is a critical component of this study. In this work we present...
Transitivity and the Co-occurrence Relation in LSI
Current research in Latent Semantic Indexing (LSI) shows improvements in performance for a wide variety of Information Retrieval systems. Researchers use experimental methods to determine the appropriate number of dimensions for a given application. We propose the development of a theoretical foundation for determination of this parameter for LSI. We assert that LSI’s use of higher orders of co...
A resource aware distributed LSI algorithm for scalable information retrieval
Latent Semantic Indexing (LSI) is one of the popular techniques in the information retrieval field. Unlike traditional information retrieval techniques, LSI is not based simply on keyword matching. It uses statistical and algebraic computations. Based on Singular Value Decomposition (SVD), the higher dimensional matrix is converted to a lower dimensional approximate matrix, of w...
A New LSI Performance Prediction Model for Interconnection Analysis of Future LSIs
As interconnection delays come to control LSI performance, estimating LSI performance at a higher design level becomes more difficult. In this paper a new LSI performance model for the estimation is described, which is built by applying a new clock-skew model to the SUSPENS (Stanford University System Performance Simulator) model. Using the model, it is shown that a specific block size,...
Publication date: 2003